
Hello! 

My name is Sergiy Migdalskiy and I work at Valve with Dirk Gregorius. I helped port, 
develop and optimize our in-house physics engine, Rubikon. Also, I have implemented a 
physics engine for Uncharted: Drake's Fortune on PS/3. 



While this talk is about physics development, I'd like to emphacise that tools and techniques 
I'm describing are general purpose. They are probably applicable to any complex system 
written in C++. 


Visualization 



Run Piaying_with_Physics.mov 
This is what I do the whole day long. 

I play with physics, throw ragdolls into sandboxes, and such silliness. 

(wait) Here I'm piling up some ragdolls, it's looking good, until.. 

I notice one of the ragdolls, the very first, doesn't seem to collide with anything. 

I'm testing some more. I make sure the other ragdolls work fine. Then I notice something. 
Here it is - the collision flags on the very first ragdoll are set to not collide with anything. 
Now the mystery is solved, and I go on to other tests. 



Physics + Qt 



Fast Dev Cycle 



We started working with Dirk by porting his rigid body engine to Valve's framework. I 
knew Qt and I like playing with stuff I'm working on, for fun and to make it easier. 
So I implemented a little IDE for our physics project. 

It was incremental. Every time something took more than 30 seconds seconds to 
look up in the debugger, I made a widget that makes it a split-second. 

We had physics running in an isolated tool with buttons, sliders and 3D window with 
shadows, the one you've just seen, which was pretty cool. And it restarted in 
seconds. We call it the Physics Testbed, or the Physics Debugger. Not to be confused 
with Visual Studio debugger. 



Physics + Game = Slow Dev Cycle 



But there comes the time for every game physics engine to be put into an actual 
game. 

Once we started integrating physics with the game, we had to debug it inside the 
game. So we lost our fast-iteration, nice Ul. 

Every engine has in-game visualization of physics. So we make do. 

It has some serious drawbacks, though. 

. You can't easily use mouse cursor for clicking collision or Ul. 

. You cannot fly around easily, you have to implement a special camera for it. 

. You need to render inside game engine, which can be pretty tricky. 

. You have to implement a special pause that pauses absolutely everything 

while you're examining something. 

• It's non-trivial to implement rich Ul, like you can do with Qt. 



Physics Ul 



Physics Testbed was a great Ul with a bunch physics-centric widgets. It was a great 
help already in making physics work. I felt I can reuse it in the game integrated 
physics. 

But how do we display in-game physics world in a separate window? 




Extra Window 



I briefly entertained an idea to open an extra Qt window in the game process. I find 
it: 

• inconvenient to implement 

. It's intrusive to the main game loop 

. does not work when the game is crashed or stopped on a breakpoint 

• You have to implement “full pause” in the game, both on server and client 
So I decided against in-process window. This means I could just reuse the existing 

separate tool, our Physics Testbed. 





Out-of-Game Visualization 


Game Physics World 



Serialized Bit Stream 


10010110010110... 



To visualize in external process, you need to send data to it. 

The hardest part about sending physics data to external app is serialization. 

But if you can properly and fully serialize your world state into a stream of bits and 
send it over the wire, it becomes very straightforward to visualize your physics in 
a separate app. 

Playing with physics in external app is a nice bonus feature. 




Serialization 



So serialization is the biggest obstacle here. But once we solve it, it becomes very 
useful: 

- we can save and look at something later 

- we can implement nice data and time profilers 

- we can record and replay bugs 

Once you can serialize, it is trivial to stream to external app. 


(Un)Serialization 



By the way, of course, serialization works two ways. The external app has to 
somehow turn our stream of bits into data structures.. 

So How do we serialize? Physics engine is complex. There are so many classes and 
data structures in there! Serializing pointers is hard. How do we build software 
that doesn’t miss anything, and then reconstructs everything faithfully. And 
hopefully it updates itself every time we add or remove a field in a class? 

Definitely not! by writing serialization by hand. 



We can analyze the engine's header files to generate serialization code. 

That's what we would do if we wrote the serialization by hand. 

But instead of doing it by hand we can parse C++ in a utility and spit out C++ which 
will serialize our data. 

But parsing C++ is extremely hard. Or is it? 



Clang is an open-source C++ compiler. In some ways it's a direct competitor of 
GCC, but with much more readable source code and liberal license. 

Clang makes C++ parsing a piece of cake. It's written in C++ itself, and it just 
works. 

We can use all our existing code as input, convert cpp and header files into 
easy-to-use AST and spit out the serializer routines. 

One bonus from using Clang is this improved C++ compliance of our code. We 
found some bugs, and prepared the code for Linux. It's really useful to compile 
your code with another compiler. 


Clang AST 


Source code 


class CRnCapsul eShape : 
public CRnShape 

pub! ic: 

private: 

Vector m_vcenter[ 2 ]; 
float m_flRadius; 
AUTO_SERIALIZE; 


Abstract Syntax Tree 



I mentioned the term AST. It means Abstract Syntax Tree. 

It's a tree that describes everything there is to know about your source code. 
This is how it looks, schematically. For every class, you'll have a node. For every 
member, its type or size, or annotation, or anything you can think of you'll 
probably get a node of some type. 
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Code Generation 


C++ 


Clang 


C++ 


#include 

class CRncapsuleShape : 
public CRnShape 

public: 

private: 

vector m_vCenter[ 2 ]; 
float m_flRadius; 
AUTO_SERIALIZE; 


void CRncapsuleShape: :Serialize( 

CRnSerial izer *pout ) const 

CRnShape: :Serialize( pOut ); 
pOut->writeBuiltin<float>( m_flRadius ); 
for( int nElement = 0; 

nElement < 2; 
nElement ) 

{ 

::Serialize( pout, 

m_vcenter [nElement] ); 


In effect, we are transforming C++ code into additional C++ code 

It's very tempting at first to put code generation as a bunch of printf() statement 

right into the your parser 

I decided to resist the urge, and I consider myself lucky I did. It would be 
horribly slow to iterate on the generated code with printfs. 


The Structured Approach 


C++ 


Clang 


json 


#i ncl ude 

class CRnCapsuleShape : 
public CRnShape 

{ 

public: 

private: 

vector m_vcenter[ 2 ]; 
float m_flRadius; 
AUTO_SERIALIZE; 

}; 


"CRnCapsul eShape" : { 
"fields" : { 

"m_flRadius" : { 


"typeName" : 

} i 

"m_vCenter" : { 

"float" , 

"className" : 

"Vector" , 

"arraysize" : 
}, } 

"bases" : [ 

"CRnShape" 

}, ] 

2, 


Parse, then Generate. Decouple these two stages, and unless your project is trivial, 
you'll be rewarded. 

I'm distilling the huge AST of the physics engine into a very concise json file. Json is 
a human readable format that is widely used nowadays. 

After adding features for a year, my Visitor class in the code that uses Clang that just 
prints out the json with relevant information is about 1000 lines long, and it would 
be incomprehensible if I generated code there. 


The Structured Approach 


C++ 


Clang 


json 


#i ncl ude 


"CRnCapsuleShape" : { 

"fields" : { 

"m_fl Radi us" : { 

"typeName" : "float", 

class CRnCapsuleShape 


{ 





"className" : "Vector", 

pri vate : 


"arraySize" : 2, 



}, 

}; 


"CRnShape" 

}, 


Parse, then Generate. Decouple these two stages, and unless your project is trivial, 
you'll be rewarded. 

I'm distilling the huge AST of the physics engine into a very concise json file. Json is 
a human readable format that is widely used nowadays. 

After adding features for a year, my Visitor class in the code that uses Clang that just 
prints out the json with relevant information is about 1000 lines long, and it would 
be incomprehensible if I generated code there. 


The Structured Approach 


C++ 


Clang 


json 


#i ncl ude 


"CRnCapsuleShape" : { 
"fields" : { 

"m_flRadius" : { 

class CRnCapsuleShape 


{ 


}, V 

public: 


private: 



» 

m_flRadius; 


}, 

AUTO_SERIALIZE; 

}; 




}, 


Parse, then Generate. Decouple these two stages, and unless your project is trivial, 
you'll be rewarded. 

I'm distilling the huge AST of the physics engine into a very concise json file. Json is 
a human readable format that is widely used nowadays. 

After adding features for a year, my Visitor class in the code that uses Clang that just 
prints out the json with relevant information is about 1000 lines long, and it would 
be incomprehensible if I generated code there. 


The Structured Approach 


C++ 


Clang 


json 


#i ncl ude 

class CRnCapsuleShape 
{ 

public: 


"CRnCapsuleShape" : { 

"fields" : { 

"m_flRadius" : { 

^typeName" : "float", 

private: 



» 

float m_flRadius; 
AUTO_SERIALIZE; 

}; 


}, 



}, 


Parse, then Generate. Decouple these two stages, and unless your project is trivial, 
you'll be rewarded. 

I'm distilling the huge AST of the physics engine into a very concise json file. Json is 
a human readable format that is widely used nowadays. 

After adding features for a year, my Visitor class in the code that uses Clang that just 
prints out the json with relevant information is about 1000 lines long, and it would 
be incomprehensible if I generated code there. 


The Structured Approach 


C++ 


Clang 


json 


#i ncl ude 


"CRnCapsuleShape" : { 
"fields" : { 

"m_flRadius" : { 

class CRnCapsuleShape 


{ 


"typeName" : "float", 

public: 


"m_vCenter" : { 

private: 



m vcenter ; 


} 

float m_flRadius; 
AUTO_SERIALIZE; 

}; 


}, 



}, 


Parse, then Generate. Decouple these two stages, and unless your project is trivial, 
you'll be rewarded. 

I'm distilling the huge AST of the physics engine into a very concise json file. Json is 
a human readable format that is widely used nowadays. 

After adding features for a year, my Visitor class in the code that uses Clang that just 
prints out the json with relevant information is about 1000 lines long, and it would 
be incomprehensible if I generated code there. 
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The Structured Approach 


C++ 


Clang 


json 


#i ncl ude 

class CRnCapsul eshape 
{ 

public: 

private: 

vector m_vcenter 
float m_flRadius; 
AUTO_SERIALIZE; 

}; 


"CRnCapsul eShape" : { 
"fields" : { 

"m_fl Radius" : { 
"typeName" : 


"m_vCenter" : { 
"className" 


"float" , 
"Vector" , 


}, 


} 


}, 


Parse, then Generate. Decouple these two stages, and unless your project is trivial, 
you'll be rewarded. 

I'm distilling the huge AST of the physics engine into a very concise json file. Json is 
a human readable format that is widely used nowadays. 

After adding features for a year, my Visitor class in the code that uses Clang that just 
prints out the json with relevant information is about 1000 lines long, and it would 
be incomprehensible if I generated code there. 


The Structured Approach 


C++ 


Clang 


json 


#i ncl ude 

class CRnCapsuleShape 
{ 

public: 


"CRnCapsuleShape" : { 
"fields" : { 

"m_flRadius" : { 
"typeName" : 

} l 

"m_vCenter" : { 

"float" , 



"cl assName" 

: "Vector", 

private: 

Vector m_vcenter[ 2 ]; 
float m_flRadius; 
AUTO_SERIALIZE; 

}; 


"arraySize" 
}, } 

}, 

: 2, 


Parse, then Generate. Decouple these two stages, and unless your project is trivial, 
you'll be rewarded. 

I'm distilling the huge AST of the physics engine into a very concise json file. Json is 
a human readable format that is widely used nowadays. 

After adding features for a year, my Visitor class in the code that uses Clang that just 
prints out the json with relevant information is about 1000 lines long, and it would 
be incomprehensible if I generated code there. 
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The Structured Approach 


C++ 


Clang 


json 


#i ncl ude 

class CRnCapsuleShape : 
public CRnShape 

{ 

public: 

private: 

vector m_vcenter[ 2 ]; 
float m_flRadius; 
AUTO_SERIALIZE; 

}; 


"CRnCapsul eShape" : { 
"fields" : { 

"m_flRadius" : { 


"typeName" : 

} i 

"m_vCenter" : { 

"float" , 

"className" : 

"Vector" , 

"arraysize" : 
}, } 

"bases" : [ 

"CRnShape" 

}, ] 

2, 


Parse, then Generate. Decouple these two stages, and unless your project is trivial, 
you'll be rewarded. 

I'm distilling the huge AST of the physics engine into a very concise json file. Json is 
a human readable format that is widely used nowadays. 

After adding features for a year, my Visitor class in the code that uses Clang that just 
prints out the json with relevant information is about 1000 lines long, and it would 
be incomprehensible if I generated code there. 


Json? 


C++ 

- 

Clang 

— 

json 

— 


C++ 


So, I'm taking my C++ code, let Clang parse it and spit out Json. 

But I ultimately need my serializer and potentially other useful tools to be written in C++ 

This begs the question: how do I generate C++ code from Json? 


StringTemplate (to the rescue)! 


C++ 

- 

Clang 

— 

json 

— 


StringTemplate — C++ 


Fortunately, there is no need to reinvent the wheel, because it was invented many times over 
before us. There are a lot of text template libraries, and languages. They are widely used and 
freely available. 

I chose StringTemplate. It's a text library that's very easy to use, and it's very well suited to 
code generation. It's declarative style, which is appealing for this kind of task. 


StringTemplate (to the rescue)! 


C++ 

- 

Clang 

— 

json 

— 


XML 


StringTemplate ^ |C++ 

XSLT 


SQL PHP 


You could use something else. In fact, if you're very familiar and comfortable with, say, PHP, 
you should absolutely use it. 

You could also use XML instead of Json. In fact, standard Clang tools can output full abstract 
syntax tree in XML directly. XSLT is a powerful transformation language for XML. 
StringTemplates and Json are much easier to learn and much more human-readable. That's 
why I personally chose them. 

There are many ways to configure this pipeline, I'm just describing what worked for me. 


StringTemplate 


C++ 


Clang — 

— ► json 

— ► StringTemplate 


C++ 


void <name>: : Serial ize( 

CRnSerializer *pOut ) const 

<cl ass. bases : 

{b | <b>: : Serial ize( pout ); }> 

<cl ass. fields. keys, class. fields. values: {k, 
v | <field( name = k, props = v )> }> 

} 


StringTemplate looks like this. The tool itself is written in java, but there's no need to know 
Java to use it. All my serialization, unserialization and memory statistics code is generated 
from a 400-line template. These are a few of those lines. They auto-generate the code to 
serialize most classes in our physics engine. 

You feed it json file, it spits out C++ file. Quite simple. 


Code Generation: Sources 


C++ 


Clang 


json 


StringTemplate 


C++ 


#i ncl ude " . . . " 

class CRnCapsul eShape : 
public CRnShape 

publ ic: 

private: 

Vector m_vCenter[ 2 ]; 
float m_flRadius; 
AUTO_SERIALIZE; 


So, to recap. 

We start with the source code of the engine - it's arbitrary C++, potentially annotated. For 
instance, you can annotate only the classes you want serialized. And exclude some fields 
from serialization. And add a member to call after serialization. 


Code Generation: Parser 


C++ 


Clang 


json 


StringTemplate 


C++ 


cl ang : : CXXMethodDecl 
{ 


FindMethodswithNameC const clang: :CXXRecordDecl *pRecord, 

const char *pName ) 


for( auto itMethod = pRecord->method_begin() , 
itMethodEnd = pRecord->method_end() ; 
itMethod != itMethodEnd; ++itMethod ) 

{ 

i f C clang: :ldentifierlnfo *pldlnfo = ( *itMethod )->getldenti fier() ) 
i f C pldlnfo->getName() == pName ) 
return *itMethod; 

} 

} 

} 

return NULL; 


The parser/Clang part is moderately complicated. I wrote the first working version 
within a couple of weeks, working on it part-time. But then I fiddled with it for well over 
a year. It handles all the special and edge cases we have in our engine. It's not a 
magic bullet, though. It will not parse anything and everything in the known universe, 
but it'll parse and describe any exotic data structure in our engine. I have to change it 
only very rarely. In fact, the last change I made was very minor. And it was over 4 
months ago. 

This slide shows how to look for a method with a given name in class. It's pretty 
straightforward to extract information once you start writing the code. 

The best source of information for me was the online Clang docs, especially their 
class diagram. It's very logical and educational. Also, Clang is very readable, so a lot 
of stuff becomes clearer after you debug it. 


Code Generation: Parser 


C++ 


Clang 


json 


StringTemplate 


"CRnCapsuleShape" : { 
"fields" : { 

"m_flRadius" : { 


"typeName" : 
} » 

"m_vcenter" : { 
"className" 

"float" , 

"Vector" , 

"arraysize" 

}, 

"bases" : [ 
"CRnShape" 

>, ] 

2, 


C++ 


The generated json file looks like this. I already explained the meaning of this snippet 
before. 


Generated Code 


C++ 


Clang 


json 


void <name>: :Serialize( 

CRnSerializer *pOut ) const 

<cl ass. bases : 

{b | <b>: : Serial ize( pout ); }> 

<cl ass . f i el ds . keys , cl ass . f i el ds . val ues : 
{ k , v | <field( name = k, props = v )> }> 


StringTemplate 


C++ 


void CRnCapsuleShape: :Serialize( 
CRnSerializer *pOut ) const 

CRnShape: :Serialize( pout ); 
pOut->WriteBuiltin<float>( m_flRadius ); 
for( int nElement = 0; 

nElement < 2; 
nElement ) 


{ 


::Serialize( pout, 

m_vcenter [nElement] ); 


To remind, this is what the string template looks like, and this here is the resulting 
generated C++ code 

The template goes through all data members of the class and generates code for 
each of them. 

The template is smart enough to know where I have arrays, templates, pointers to 
follow and so on. 

All the non-plain data elements are recursively serialized by other overloaded 
serialization functions. Some of them methods, some of them free functions. I 
especially prefer free functions for simpler types, containers and third party classes 
where I don’t want to touch the header where those classes are declared. 



Once you complete these 5 easy steps, you'll have yourself one wholesome serializer. It will 
be self-maintaining. You will forever be able to visualize your physics in all detail. 

Sometimes we don't even know something is wrong. Some bugs ship and go unnoticed for 
years. This will never happen to you again. 

Well..., I can't make guarantees... But I really hope I convinced you that visualizing stuff is 
great. Having serializer auto-magically generate itself makes everything even better. 

But we have quite a bit more to discuss before we wrap up. 
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Record and Replay 



=Play= Top_HeavyJ_amp.wmv 

Here's an example of one workflow when you have full serialization. 

I noticed something weird with a lamp here. It was always falling down. Lamps are 
supposed to stand straight, and this one was always on its side. 

So I snooped and looked at the lamp. 

(wait) When I figured out why it's not standing (because its center of mass is too 
high), I just sent out an email to an artists - and I attached the file he can play 
with to prove my words. It's quite intuitive. 




Clang's “Hello World!” 


See ClangCheck . cpp 

class CDumpAction: public clang: :ASTConsumer 

11 vm: : raw_fd_ostream &m_log; 
public: 

CDumpAction( 11 vm: : raw_fd_ostream &fd ):m_log( fd ) 


virtual bool HandleTopLevelDecl ( clang: :DeclGroupRef DG ) 

for ( clang: :DeclGroupRef: iterator it = DG.beginO, 
itEnd = DG.endO; it != itEnd; ++it ) 

{ 

clang: :Decl *pDecl = *it; 
pDecl ->dumpXML( m_log ); 

return true; 


Here is the Clang's “Hello World” program relevant to this use case is called ClangCheck and it is a 
great starting point to write your own tool. It's part of the Clang distro, and I urge you to have a 
look at it. 

It uses visitor pattern. For every top-level declaration in your source file, such as a class or function 
definition, or a typedef, or a global, you get a callback. In that callback, you can drill down the 
AST. 

This self-contained example is dumping those declarations as XML to a log. If XML is your thing and 
performance is not the most critical issue, this here just may be your parser. 
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Typically, all our class declarations are contained in headers. 

It's enough to parse one .cpp file that includes all the relevant headers for serialization. 
There's no need to parse the whole engine, although there may be other useful tools that 
would need it. 

To parse just one file, you'll need to create a compilation database file with cpp file name 
and compile options. 


Clang Compilation Database 


[ 

{ 

"directory": , 

"command": "clang -fsyntax-only -fms-extensions -fms-compatibility -I. ./public/include 
rnserial ize.cpp" , 

"f i 1 e" : " rnseri al i ze . cpp 1 ’ 


Here's an example of compilation database. The most interesting options are highlighted 
here. 

Compilation database is like a makefile, or vcproj. If you don't want to do anything non- 
standard, you can simply generate a .json file with all the information and give it to clang. 

We use a tool to generate VCPROJ files, and we changed that tool to also spit out 
compilation databases. 


Clang Bootstrap Cheatsheet 


class MyAction: public clang: : ASTConsumer 
public: 

virtual bool HandleTopLevelDecl ( clang: :DeclGroupRef dg ) 

for ( clang: :DeclGroupRef: :iterator 

it = DG.beginO, itEnd = DG.endO; it != itEnd; ++it ) 

clang: :CXXRecordDecl *pRecord = 

llvm: :dyn_cast<clang: :CXXRecordDecl>( *it ); 

// do something with it... 

return true; 

} 

}; 


class MyFactory 
public: 

clang: :ASTConsumer 

*newASTConsumer() 

{ 

return 

new MyAction( ); 

} } 




pCompilationDb = clang: :tooling: : JSONCompilationDatabase: :loadFromFile( strJsonDb, errorMessage ); 
files = plsonDb->getAll Fi 1 es() ; 

clang: :tooling: :ClangTool Tool ( *pCompilationDb, files ); 

MyFactory factoryC pisonDbPath ); 

int nError = Tool.run( clang: :tooling: : newFrontendActionFactoryC &factory ) ); 
llvm: :errs() .flush() ; 
llvm: :outs() .flush Q ; 



And here is a little Cheatsheet. In your parser tool, that you carefully link with all the clang 
libraries, what do you do first? 

Start with implementing ASTConsumer, make a factory class for it, and then in your main() 
function, load compilation database and then run an instance ClangTool. It's all here in 
this slide. 

One thing to remember: you may have to flush std out and std error. 


Box Cast 



Before going further, I'd like to clarify what I mean by box cast. 
We often use it in Source engine. It's like raycast, but with a box. 


Contact 



Just take a box, move it linearly until you hit something. 

At that point, you sometimes create a contact point for the solver. 

We use it for player movement a lot, so debugging it is a common thing to do. 


Visualizing Physics In Game 



=Run= PlayerJVIovement.mov 

Let's see what we've got for our efforts so far. 

Here's a game running alongside visualization tool. 

I'm walking around a room, and I'm looking at the raycasts and box casts our player 
movement is making 

I can look at it from the 3 rd person view, I can only render the relevant data. 

I'm noticing something strange, there's a flurry of box casts sometimes. It might 
probably result in a framerate hitch. 

So I'm trying to reproduce this. 

I find a place where it's happening. 

And I'm flying around to see what causes it. 

So far, so good. By the way, I can play with that physics at any point, but it won't 
affect the game, so it's not useful in this case. 
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When That Is Not Enough 


wfl*' * * 








Well, what if just looking at the problem does not present an obvious fix? 
What if we have to actually debug code? 



At least when you debug your own code, it's easy to see the parts. The components. 
What they are supposed to do. 

It's easy to check and see if they are actually doing it. 
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Working in a team, it's different... It's hard to see the whole picture. It's hard to check 
if the parts are working properly. 



Even when I find a bug in foreign code, there are usually many ways to fix it and I'm 
never quite sure which one is the best. 

Or maybe I misunderstood the code and my fix will cause another bug. 


45 


Tools 



And the debugging tools we have are very basic: We have the Watch window, and 
printf. That's just inadequate. 



And now we have this wonderful application.. I wonder if we could use it when we step 
through the breakpoints just like we use it when we walk through the game. 

We need to leverage it! But how? 



Debug Experience 


« m pPartidaj.14 0*0000000011266300 (po»«( 687629443, 6769.97656, 3< 

• * |01 pos=< 6876.29443, 6769.97656, 364 S33264) *=( 0.231891751 

- : pos>l 685443750. 669925000. 336452576) *-(0231891751 

' • |2) pos=( 6866.72S10 6739.01172. 3S2239136) *=(0231891751. 

• -|31 pos-( 6896 30566, 672100684, 412660950) *•( 0.621897459 

• * |4| pos=( 6903.90430. -6718.92432, 419238647) *=(-0.795030117 

• -15! pos.( 6892.56396, 672399951. 408815186) *•( 0.576003432 

' • 16) po*=( 6877,71094. 6774.56104, 366.353394) *=(-0231891751 

• -|71 pos=( 6865.03955, 6733.55664, 350073364) <=(0231891751. 

poi=| 6915.52539. 6715.27734. 420 790619) <=( 0 745445311 

• - 19] pos=( 6862.48145, 6725.27881, 346786865) *=(0231891751 

• • |10t po»=( 6925.2163L 671195996, 417.005035) *=( 0.674079120 

■ > |11| pos=( 6882.99365. 679165576 373.140533) x=( 0.231891751 

• * |12| po*=( 6887.22559 6805.36816 378 716858) *•( 0.231891751 

• • |13| pos=| 6899.58838. 6845.37256 394 600006) *=l 0.231891751 

- pSimulattonWoddT ramform* 14 0*0000000057960340 |po«=( 6904.11133. 671047217, 325.63 

• |0| pos=( 690421133. -6710.47217. 325.638428) *=(0.772891462 

• • m*tn*3*4_t pos-( 6904.11133, 671047217, 325638428) *.(0772891462 

- - m AMsIVal 0*0000000057960340 |0*00000000579603a0 10 772891462 0. 

po»=( 6901.78857. 671184180, 349.343170) *=(-0.021016858 
poi=( 6896.30273. 6697.04053, 322.380066) *=(0372540712, 
pos=( 6907.72754. 672545947, 321857361) *=( 0086953304 
pos=( 6902.04639. 6711.83691, 364.934204) *=(0.0402162559 
pos=( 6900 .82715. 671228955, 380.599396) *=(00578S26743 
pos=( 6900,0371 L 671420801. 391.881500) <=(0610547960 
pos=( 6892.98975. 6711,08350. 381.304016) *=(0299724787, 
pos=( 6899.02979, 6720.03516, 380205017) *. (0.5995 34094. 
pos=( 6890.17285. 6721.56641, 397.602539) *=(0 798212171, 
po»=( 687902051 671674902. 393418640) <=(0 136251017. 
poj=( 6883.49463, 6730.69482, 392.627655) *=(0.263435274, 
pos*( 6892 62842 672141406. 408810089) <«( 0.579993010 
poi=( 6884.S2002 6723.33447, 396.799957) *=(0256185055. 


m ObugNam* 
m.p World 
•*m,pFaModal 
■» m.HBendUnderftela* 
*m IIStr.tchUnd.rR.la* 
••m HOverPredict 
* mnNodsCounl 
•>m nParticleCount 


0*0000000002bdf700 (m_Fitter=|m.GroupPairs=0*0000000Q02 
0.00000000110364 f9 (m nFlagi=40O9750271 m nNodaCount 
0.000000000 
0000000000 
0.000000000 


0x000000001 1266300 pos*! 6876.29443, 676997656, K 
0*0000000011266840 l.*(NDOOOO. 14INDOOOO, lalNDOOOO 
0*000000001 1266920 14INOOOOO, 1* IN 00000. 1.80900000 



■ - pSimulationWorldTransforms.l 


0*0000000011266300 Ipos-I 687629443 6769 97656. 364.53 
pos=< 6876.29443, 6769 97656, 364.533264) *=( 0.231891751 
pos=( 685443750, 669925000. 336452576) *=(0 231891751, 
po*=( -6866.77510, 673901172, 352.239136) *={0.231891751. 
pos=< 68% 30566. 672100684, 412.660950) *=( 0.621897459 
pos=( 690390430, 6718,92432, 419238647) *=( 0.795030117 
pos=( 6892 56396, 6721.99951, 406815186) *=< 0.S76003432 
pot=( 6877 71094, 6774 56104. 366353394) >.[ 0.231891751 
poi=( 686503955, 6733.55664, 350.073364) >=(0.231891751. 
po»=( 6915 52539 671527734,420790619).-; 0 745445311 
poi=( 6862.48145, 672527881. 346.786865) *=(0.231891751. 
po$=( 692521631. 671195996, 417005035) *• I 0.674079120 
pos=( 6882 99365, 6791,65576. 373.140533) *=( 0.231891751 
pos={ 688722559, 6805 36816. 378716858)*=! 0.231891751 
poi=< 6899 58838, 684537256. 394.600006) *=| 0.231891751 
0*0000000057960340 lpos=( 6904.11133, 671047217, 325 63 
pcj=( 690411133, 6710.47217. 325 638428) *=(0772891467. 
pos=<- 6904.11133. 6710.47217. 325.638428) *=(0772891462. 
0*000000005 79603a0 |0*00000000579603a0 10.777891462. 0.3 
pos=( 6901.78857, 671184180. 349 343170) *=(-0.021016858 
poj.( 68% 30273, 6697.04053. 322 380066) *.{0372540712 
pos=( 690772754. 672545947, 321.857361) *={ 0.086953304 
pos=( 690204639, 671183691. 364 934204) *.(00402162559 
po»=( 690082715, 671228955. 380599396) *=(005785267 
pos=( 690003711, 6714.20801. 391.881500) *=(0.610547960, 
poj=( 689298975, 6711.08350, 381 304016) *=(0.299724787 
pos=< 689902979 6720.03516. 380205017) . =(0599534094. 
pos=( 689017285, 6721.56641, 397.602539) *=(0.798212171, 
pos=< 6879.02051, 671674902, 393418640) *=(0.136251017. 
poj=< 688349463, 6730 69482, 392.627655) *={0.263435274, 
pos=< 6892.62842, 672141406. 408810089) *=( 0.579993010, 
po*-( 6884.52002, 672333447, 396799957) *=(0256185055, 


■n.DebugNanw 
■n pWotld 
n pFsModel 
n IIBandUnderRela* 
n.flStretchUnderRela* 


0*0000000002bd(700 |m Filt*t=(m GioupPa.<i=0*0000000002 
0*0000000011038419 |m nFlags«4009750271 m nNodeCount 


*> mnNodsCounl 
« m.nPartklaCount 
■ *• m.pPartide* 

• « m pPosO 
' - m pPoil 


0*0000000011266300 pos=( 687629443, 6769.976S6. 364 53 
0*0000000011266840 1 OINDOOOO, 1 »1NDOOOO, 14IND000C 
0*0000000011266920 lalNDOOOO, lalNDOOOO. lalNDOOOO 


The usual debug experience consists of sifting through 
through lines of code, imagining our 3D world in our 


Watch windows, stepping 
mind's eye.. 


Improved Debug Experience 


» mpPartlcla»,l4 0*0000000011266300 (po*«( 6876-29443. 6769.97656, 3< 

• * 10) pos=( 6876.29443, 6769.97656, 364 S33264) *=( 0.231891751 

- : poi-l 685443750. 669925000. 336452576) *-(0231891751 

' • |2) pos=(-6866.72510. 6739.01172, 352239136) *=(0231891751. 

• *131 pos=( 6896.30566, 6721.00684, 412660950) *.( 0.621897459 

• • |4| pos=( 6903.90430, -6718.92432. 419238647) *=(-0.795030117 

‘ -(51 pos-( 6892.56396, 672199951. 408815186) *.( 0576003432 

' • |6| pos=l 6877.71094. 6774.56104, 366.353394) *=(-0231891751 

• -171 pos=( 6865.03955, 6733.55664, 350.073364) *.(0231891751, 

poi=| 6915.52539. 6715.27734. 420.790619) *=( 0 745445311 

• • |9] pos=( 686248145, 672527881, 346.786865) *=(023189175L 

• • |10| po»=( 6925.2163L 671195996, 417.005035) *=( 0.674079120 

■ > |11| pos=( 6882.99365 6791.65576, 373.140533) x=( 0.231891751 

• * 112| poj=( 6887.22559. 6805.36816, 378 716858) *•( 0.231891751 

• • |13| pos=( 6899.58838. 6845.37256, 394600006) *=l 0231891751 

- pStmulattonWoHdT raniformv 14 0*0000000057960340 |po«=( 6904.11133. 671047217, 325.63 

• |0| pos=( 6904.11133, -6710.47217. 325638428) *=(0.772891462 

- * matri*3x4_t pos-( 6904.11133, 6710.47217, 325 638428) *.(0772891462 

■ • m (IMalVal 0<00000000579603a0 |0*00000000579603a0 10 772891462 0. 

' 111 po*.( 6901.78857. 671194180 349.343170) *=( 0.021016858 

' |2) poi=( 6896.30273, 6697.04053, 322.380066) *=(0372540712 

' |3i pos=( 6907.72754, 672545947. 321857361) *=( 0086953304 

• - 14] pt»=( 6902.04639. 671183691, 364.934204) <=(0.0402162559 

• 15) pos=( 6900.82715, 671228955. 380.S99396) *=(00578S26743 

• 16] pos=( 6900.0371L 671420801, 391.881500) *=(0610547960. 

(71 pos=( 6892.98975. 6711.08350, 381.304016) *=(0299724787. 

' • J8! po»=( 6899.02979, 672003516 380205017) *=(0399534094. 

• * |9] pos=( 6890.17285 6721.56641, 397 602539) <=10 798212171, 

‘ |10| po»-( 6879.0205L 671674902. 393418640) *=(0 136251017. 

: I pos=( -6883.49463. 6730.69482, 392.627655) *=(0.263435274, 

• |12| pos-l 6892 62842 672141406, 408810089) *=( 0.579993010 

• (13] poi=( 6884,52002 6723.33447, 396.799957) *=(0256185055, 


• «m ObugName 

• ■'m.pWorld 

• •'inpfaModal 

■» m fIBendUnderReia* 

■v m (IStratchUndarRala* 
••m flOvcrPredict 
* fnnNodeCount 
■»m nParbdeCount 


0*0000000002bdf700 (m_Fi(tef=|m.GroupPairs=0*0000000<X)2 
0*00000000110384(9 (m nFI«gi=40097S0271 m nNodaCount 
0.000000000 
0000000000 
0.000000000 


0*0000000011266300 pos* I 6876 29443, 676997656, K 
0*0000000011266840 l.*(NDOOOO. 141NOOOOO. H1ND000C 
0*000000001 1266920 1*0400000, 1*0400000, 1 *0400000 




Ideally, I want to see the world just like I did in the Physics Testbed, but I also want to see it 

ch atf^^||j^si in the Physics Testbed, but I 

i wsaJsta warrti (tQ(6®e airtocDh a ttigt ijigyjbiet wbttm&oasnlcSteprtteptn^h mytbreisikpxaiPtsii for me. 

Thai That 

would be ideal for me. 

That would make debugging much more comfortable, enjoyable and 
productive. 


Debugging Visually 



Let me show you what I mean. 

As you can see, this is the place with too many box casts that I discovered in the 
previous demo. 

I place a breakpoint in the box trace function. Every time the it hits, I'm looking at the 
3D world. I'm seeing the very last box cast. 

This lets me step through bunch of casts, visualizing each of them in Physics 
Testbed. 

This whole thing happens inside of a frame. 

Please note that the game didn't send any data packets to Physics Testbed. It's 
casting boxes in a very tight loop. The testbed is effectively acting like a watch 
window. 



Reading Memory Directly 



The method that lets us do it is ReadProcessMemory API. 

The Physics Testbed, acting in a way like a debugger, reads memory from the game 
without the game knowing it. I call this snooping. There is no way for the 
debugger to alter the game state, which is a nice benefit. 

It works when process is stopped in debugger 

There is no activity required from the game side besides a mutex and advertising 
where to start the search (the root data structure). Both are trivial to implement 
and are not intrusive. 
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ReadProcessMemory performance 



The biggest question when I thought of this idea was whether it's fast enough. 
So I wrote a small benchmark. 

It is slow to read every 4- or 8-byte field across the Process boundary 
Every call to ReadProcessMemory descends into kernel, performs a syscall — 
it costs you at least lus, limiting your bandwidth. 

But just copying 4 bytes in-process takes a thousandth of that time. 



ReadProcessMemory performance 



I simply implemented software cache layer that reads at least 4Kb page every 
time you snoop a byte and caches that page 

You always know you can read the 4KiB page if you can read a single byte of 
it. There will be no memory protection faults. 

This cache helps a lot with reading C NULL-terminated strings, for which 
length is unknown until you read the whole string one byte at a time. 



Traversing Data Structure 



So, we can snoop the bytes from the game physics, but how do we know what to do 
with them? 

In fact, walking the memory of the other process is very much like serialization. If you 
know your data structures, you can traverse them in another process using 
ReadProcessMemory just the same as you traverse them in the same process 
when you serialize them. 




Snooper and Serializer 


C++ Classes 


Clang 


Snooper Code 



Serializer Code 


We can still use Clang to understand our data structures. 

We can use all our existing code as input and spit out the snooper routines. 


Snooper: Code Generation 


C++ 


Clang 


C++ 


#include 

class CRncapsuleShape : 
public CRnShape 

public: 

private: 

vector m_vCenter[ 2 ]; 
float m_flRadius; 
AUTO_SERIALIZE; 


void CRncapsuleShape: :Snoop( CRnSnooper*pln , 
const CRncapsuleShape *pLocalCopy ) 

CRnShape: :Snoop( pin, pLocalCopy ); 

m_flRadius = pLocalCopy->m_fl Radi us ; 
for( int nElement = 0; 

nElement < 2; ++nElement ) 

{ 

: : SnoopC pin , 

&pLocalcopy->m_vcenter [nEl ement] , 
m_vcenter [nEl ement] ); 

} 

> 


Just like with the serializer, we are transforming C++ code into additional C++ 
code for snooper 

Something is becoming clearer now. Generating code as a bunch of printf() 
statements would really be awkward. Because we now have to print out 
serialize, unserialize, snoop and possibly collect statistics routines. 


Snooper: The Same Json 


C++ 


Clang 


json 


#i ncl ude 

class CRnCapsuleShape : 
public CRnShape 

{ 

public: 

private: 

vector m_vcenter[ 2 ]; 
float m_flRadius; 
auto_serialize; 

}; 


"CRnCapsuleShape" : { 
"fields" : { 

"m_flRadius" : { 


"typeName" : 

} i 

"m_vCenter" : { 

"float" , 

"className" 

"Vector" , 

"arraySize" 
}, } 

"bases" : [ 
"CRnShape" 

}, ] 

2, 


So, we still generate the same json file as we did for serialization. 


Snooper: Different StringTemplate 


C++ 


Clang 


json 


StringTemplate 


C++ 


void <name>: : Snoop ( CRnSnooper*pln , 
const <name> *pLocalCopy ) 

<cl ass. bases : {b | 

<b>::Snoop( pin, pLocalCopy );}> 

<cl ass . f i el ds . keys , cl ass . f i el ds . val ues : 
{k,v | 

<snoop_f i el d(name=k , props=v)> 

} 

> 

<i f (cl ass . postini tMethod)> 

AfterRestore( pin ); 

<endif> 


The StringTemplate for the snooper looks like this. It's a different string template, but it's 
small. All my snoop code is generated from a 200-line template. These are a few of those 
lines. They auto-generate the code to snoop most classes in our physics engine. 

You feed it the same json file as for the serializer. It spits out snooper C++ file. It's that simple. 


Snooper: Generated Code 


C++ 


Clang 


json 


void <name>: :Snoop( CRnSnooper*pln , 
const <name> *pLocalCopy ) 

cclass. bases : {b | 

<b>::Snoop( pin, pLocalCopy );}> 

<cl ass.fi ei ds . keys , cl ass . f i el ds . val ues : 
{k,v | 

<snoop_fi eld(name=k, props=v)> 

} 


<i f (cl ass . postlni tMethod)> 

AfterRestoref pin ); 
<endif> 


StringTemplate | 


C++ 


void CRnCapsul eShape: :snoop( CRnSnooper*pln, 
const CRnCapsul eSnape *pLocalCopy ) 

CRnShape: :Snoop( pin, pLocalCopy ); 

m_flRadius = pLocalCopy->m_fl Radius ; 
for( int nElement = 0; 

nElement < 2; ++nElement ) 

{ 

: : SnoopC pin , 

&pLocalCopy->m_vCenter [nEl ement] , 
m_vcenter [nElement] ); 

} 

AfterRestore( pin ); 


This here is the template and the resulting generated C++ code for snooper. 

I'm snooping the whole struct into a local copy. Then I read out all the plain data 
elements. 

All the non-plain data elements are recursively snooped by other overloaded Snoop 
functions. 

The template is smart enough to know where I have arrays, templates, pointers to 
follow and so on. 


Snooper: Generated Code 


C++ 


Clang 


json 


StringTemplate 


C++ 


void ::Snoop( CRnSnooper*pIn , 

const "'pLocalCopy ) 



This here is the template and the resulting generated C++ code for snooper. 

I'm snooping the whole struct into a local copy. Then I read out all the plain data 
elements. 

All the non-plain data elements are recursively snooped by other overloaded Snoop 
functions. 

The template is smart enough to know where I have arrays, templates, pointers to 
follow and so on. 


C++ 


Snooper: Generated Code 

— *| StringTemplate | 


Clang 


json 


C++ 


void <name>: :Snoop( CRnSnooper*pln , 
const <name> *pLocalCopy ) 


void CRnCapsul eShape: :snoop( CRnSnooper*pln, 
const CRnCapsul eSnape *pLocalCopy ) 


This here is the template and the resulting generated C++ code for snooper. 

I'm snooping the whole struct into a local copy. Then I read out all the plain data 
elements. 

All the non-plain data elements are recursively snooped by other overloaded Snoop 
functions. 

The template is smart enough to know where I have arrays, templates, pointers to 
follow and so on. 
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C++ 


Snooper: Generated Code 

— *| StringTemplate J 


Clang 


json 


C++ 


void <name>: :Snoop( CRnSnooper*pin , 
const <name> -pLocalCopy ) 

<class. bases : {b | 

<b>::Snoop( pin, pLocalCopy );}> 


void CRnCapsul eShape: :Snoop( CRnSnooper*pln, 
const CRnCapsul eSnape *pLocalCopy ) 

CRnShape: :Snoop( pin, pLocalCopy ); 

} 


} 


This here is the template and the resulting generated C++ code for snooper. 

I'm snooping the whole struct into a local copy. Then I read out all the plain data 
elements. 

All the non-plain data elements are recursively snooped by other overloaded Snoop 
functions. 

The template is smart enough to know where I have arrays, templates, pointers to 
follow and so on. 


Snooper: Generated Code 


C++ 


Clang 


json 


StringTemplate J 


C++ 


void <name>: :Snoop( CRnSnooper*pin , 
const <name> -pLocalCopy ) 

<class. bases : {b | 

<b>::Snoop( pin, pLocalCopy );}> 


void CRnCapsul eShape: :Snoop( CRnSnooper*pln, 
const CRnCapsul eSnape *pLocalCopy ) 

CRnShape: :Snoop( pin, pLocalCopy ); 


m_flRadius = pLocalCopy->m_flRadius; 

<cl ass . f i el ds . keys , cl ass . f i el ds . val ues : 


for( int nElement = 0; 

{k,v | 

<snoop_f i el d (name=k , props=v)> 

} 


nElement < 2; ++nElement ) 

{ 

: : SnoopC pin , 

&pLocal Copy->m_vCenter [nEl ement] , 

> 


} 


m_vcenter [nElement] ); 

> 

} 


This here is the template and the resulting generated C++ code for snooper. 

I'm snooping the whole struct into a local copy. Then I read out all the plain data 
elements. 

All the non-plain data elements are recursively snooped by other overloaded Snoop 
functions. 

The template is smart enough to know where I have arrays, templates, pointers to 
follow and so on. 
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Snooper: Generated Code 


C++ 


Clang 


json 


void <name>: :Snoop( CRnSnooper*pln , 
const <name> *pLocalCopy ) 

cclass. bases : {b | 

<b>::Snoop( pin, pLocalCopy );}> 

<cl ass.fi ei ds . keys , cl ass . f i el ds . val ues : 
{k,v | 

<snoop_fi eld(name=k, props=v)> 

} 


<i f (cl ass . postlni tMethod)> 

AfterRestoref pin ); 
<endif> 


StringTemplate | 


C++ 


void CRnCapsul eShape: :snoop( CRnSnooper*pln, 
const CRnCapsul eSnape *pLocalCopy ) 

CRnShape: :Snoop( pin, pLocalCopy ); 

m_flRadius = pLocalCopy->m_fl Radius ; 
for( int nElement = 0; 

nElement < 2; ++nElement ) 

{ 

: : SnoopC pin , 

&pLocalCopy->m_vCenter [nEl ement] , 
m_vcenter [nElement] ); 

} 

AfterRestore( pin ); 


This here is the template and the resulting generated C++ code for snooper. 

I'm snooping the whole struct into a local copy. Then I read out all the plain data 
elements. 

All the non-plain data elements are recursively snooped by other overloaded Snoop 
functions. 

The template is smart enough to know where I have arrays, templates, pointers to 
follow and so on. 


Full Pipeline 


C++ 


Clang 


json 


StringTemplate 


C++ 


#i ncl ude 


ASTConsumer 


{ 


void <name>:: 


void Shape: : 

"physics. h" 




"fields": . . . 

} 


SnoopQ . . . 


SnoopQ . . . 


Just to bring the point home, here's how the full code parsing and 
generation pipeline looks in our case 

Parsing C++ is probably the most intimidating thing in this case. But after you spend a day or 
two learning Clang's API, you might find it rather intuitive. 


Example: Snooping Joint Stack 



Let's see a little demo of what we have as a result. 

I configured a projection-type solver of a chain of hinge joints. As you can see, small 
chains solve fine, but larger chains start to break constraints. 

I wanted to see how the solver behaves inside the iteration loop. 

So I put a breakpoint where each joint is solved. 

After each breakpoint hits, I switch to the Testbed and snoop. I have two testbeds 
here, one debugging the other. Notice that one is unresponsive - that's the one 
that I'm debugging. And another one acts like a watch window. 

So what we have here effectively a visualization of the internal loop of a projection 
constraint solver. 



Fixing stuff, whether it's yours or not, is unavoidable. At least if you make anything 
complex. 

The faster you can drill down to details, the more efficient you are while debugging 
problems, the more time you spend developing and not fixing stuff. Once you 
make debugging lOx faster and more convenient, there's a quantum leap : you 
can develop features with complexity that would otherwise be infeasible to 
develop. 

That's why it's important to make the process as seamless and convenient as 
possible. It pays with higher productivity in the end. 

There are a few more technical details, though 



Divide and Conquer 



Our Testbed is a tool for divide-and-conquer debugging. It makes it quick and 
painless. 

For example, we have a collision filtering system, and time to time something is not 
colliding right in the game. We just fly around, click on objects that don't collide 
and drill down to their flags. In most cases we see exactly what's happening 
within seconds, and you don't have to be a programmer to do that. 

Another example: we had an incredibly slow physics. It worked fine, but was very 
slow. All contacts looked correctly. So we snooped and clicked the objects that 
were awake. Their contacts looked fine. It took us some seconds to realize there 
were too many contacts on them. 



Mutex in game loop 



One consideration here is thread safety 

While you cannot use the Watch window in VS IDE while the game process is 
running, you can snoop. 

Use Mutex to guard the physics step routine 


SEH 




try { . . . } except ( ) { } 

Snoop 

t 1 

Mutex 






Snooping a running process is tricky: data structures are in flux while you are 
reading them. 

It's impossible to crash the game by reading data from it. But the snooper 
might crash if you snoop an inconsistent memory. I'm just using 

( try... except) to catch that. Strictly speaking it's not a safe solution, but it 

makes the experience smoother. 


SEH != try. ..catch 


try { 


try { 

} except ( ) { 

! = 

} catch (...){ 


■ 


} 


} 


*** Also, neither is necessary 


Just to be clear 

Structured exception handling is not the same as C++ exception handling. 

This goes outside the topic of this talk. 

In any case, you don't really have to use either of these. 

I just find it nice to wrap my sloppy debugging tool code in case I happen to snoop 
bad data structure. 

If I released it as a commercial product, I'd expend the extra effort to detect that 
memory is unreadable and not crash. 



When the game is stopped on a breakpoint, just ignore the Mutex. Or time it 
out. 

Data structures cannot change when the game is frozen, so there's a high 
chance snoop will succeed if they are consistent. It depends on where you put 
the breakpoint. 


Clang Annotations 


# define clang_attr(attr) \ 

attribute ( ( annotate( ATTR ) ) ) 

#def i ne SERIALIZE_ARRAY_SIZE( SIZE ) \ 

CLANG_ATTR( "ar ray_si ze : " #SIZE ) 


class CSomeClass 

{ 

uintl6 *m_pNodeToCtrl 

SERIALIZE_ARRAY_SIZE( m_nNodeCount ); 
uint!6 m_nNodeCount; 


bool HasAnnotatedAttr( 

const clang: :Decl *pDecl , 
const char * psubstr ) 

if( clang: :AnnotateAttr *pAnnotate = 
pDecl ->getAttr<clang: :AnnotateAttr>() 

} { 

ShortStringRefvector attrs; 
pAnnotate-> 

getAnnotationO .split( attrs, " " ); 
return std::find( attrs . begin () , 

attrs. end(), psubstr ) != 
attrs. end() ; 

} 

return false; 


Sometimes we need some annotations to our code. E.g. for each pointer you 
serializer will need to know how many objects it's pointing to. It's easy to put 
this in (see the left slide). 

And ot's easy to add and read those annotations in the parser ( see the right 
slide ) 
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Complex Case: Polymorphism 


struct VtableRecorcLt 
{ 

template <typename T> 

void lnit( CUtlStringToken name ) 

{ 

T* pObject = new Tj 

m_nVTable = *( ( uintp* jpObject ); 

delete pObject; 

m_nClassName = name.m_nHashCode; 

} 

uint64 m_nVTable; 
uint64 m_nClassName; 

>; 


I'd like to talk about a couple of less trivial cases, like classes with Vtables. 

I auto-generate a function that creates an instance of each class with Vtable, 
and copies its Vtable point into a known place (an array of uintptr t). 

The snooper can then read the array and use it to recognize class type by its 
vtable pointer. It's like RTTI that works across process boundaries. 


Complex Case: Polymorphism 


void Ini tvtabl eRecord( CUtlVectorcVtabl eRecord_t> *pTable ) 

<data. classes. keys, data. classes. values: {k,v | <register_class(name=k,class=v)> }> 


registered ass (name, cl ass) ::= « 

<i f (cl ass . i sLeaf )> 

( *pTable )[ pTabl e->AddToTai 1 () ] .init\«name>\> ("<name>" ); 

<el se> 

<i f ( ! cl ass . i sBase)> 

// <name> is neither leaf nor base, no need to register it for auto-recognition 
<endif> 

» 


This is how I generate that code with StringTemplates 


Complex Case: Polymorphism 


void Ini tvtabl eRecord( CUtlvectorcVtabl eRecord_t> *pTable ) 

( *pTab1e )[ pTabl e->AddToTail () ] .lnit<CRnWeidloint>( "CRnweldloint" ); 

// CRnShadowController is neither leaf nor base, no need to register it for auto- recognition 
( *pTabie )[ pTabi e->AddToTail () ] .lnit<CRnSpringloint>( "CRnSpringloint" ); 

} 


And this is what the generated code looks like. One line is generated for every 
class with vtable. It’s done automatically, so you don’t have to remember to 
add a class to the list once you add a virtual . 

You don’t have to remove a class when you remove the last virtual from it 
either. 


More Complex Cases 


Compound data structures 


Automated CodeGen 

. World Root 


These data structures change frequently 

• Rigid Bodies 


Easy to generate code for 

. Joints 


Easy to mark up with clang annotations 

. Contacts 



. etc. 




Primitive data types and special cases 


Write code by hand 

. float, int and other PODs 


Change infrequently 

. Simple types without pointers 


Sometimes hard to generate code for 

. std::vector<> and external containers 


Building blocks for compound data 

. Special containers 




For some data structures, it's just easier to write serialization code by hand. The 
code generator should know about leafy data types like int and float, and 
generate serialization code automatically. There's no problem with structures only 
consisting of those data types, those can be simply copied directly from another 
process address space. 

It's a bit more involved with pointers. In general, C++ does not provide enough 
semantics for you to know if a pointer points to one object, or a vector of objects. 
I'm using clang annotation to add that semantics. So, the code generator looks at 
that annotation and computes the size of the array and serializes accordingly. 

We don't use multiple heaps for our physics engine, but if you do, you'll have to 
provide semantics about how to allocate the snooped or unserialized data 
structures. 

In strange cases, like an int that is a pointer, but the lower 2 bits have a special 
meaning, you probably want to write that routine by hand. Also when you have a 
union and have to run code to decide what it means. 

You have a choice whether to make your code auto-generator smarter (and harder 
to maintain) or snoop a specific class manually. It's a judgement call. 


Statistics 


C++ 


Clang 


json 


StringTemplate 


C++ 


Non-trivial data: 

46 classes 
320 fields 


***AII generated code is typo-free 


We currently have 46 non-trivial classes with 320 fields for which serializer and 
snooper code is generated. 


Statistics 


C++ 


Clang 


json 


StringTemplate 


C++ 




Parser: 








*■ 

1000 lines 





* 



***AII generated code is typo-free 


The parser that uses clang is about 1000 lines long. I probably spent a couple 
of weeks part time writing it, and I've been using it for a year. 


Statistics 


C++ 


Clang 


json 


StringTemplate 


C++ 



***AII generated code is typo-free 


The generated json is 2800 lines long 


Statistics 


C++ 


Clang 


json 


StringTemplate 


C++ 








Snoop template: 
200 lines 









Serialize template: 
400 lines 




***AII generated code is typo-free 


Snooper template is 200 lineslong . 
Serializer template is 400 lines. 


Statistics 


C++ 


Clang 


json 


StringTemplate 


C++ 










Snoop code: 

2500 lines 









Serialize code: 
5600 lines 


***AII generated code is typo-free 


Auto-generated snooper is 2500 lines long. 

Serializer is 5600 lines long, because it includes 3 directions: serialize to 
stream, unserialize from stream, count number of bytes used (very similar to 
Serialize, but actually counts allocated memory). 

It takes 2-8ms to snoop a typical game frame, which is normally maybe 10 mb 
large 


Physics in Games: Integration 



Every game physics engine eventually gets integrated into a game. It makes 

change-compile-test iteration much longer. Especially when artists start building a 
huge map, and new exciting bugs show up that only happen in this huge map 
once in a blue moon. 

It makes bug hunting mind-numbingly slow. Because the game code is much, much 
larger than the physics code we know so well. 
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Physics in Games: Perspective 



Game code is written by many people over many years. Physics engine is self-contained 
and neat, almost miniature in comparison. 

At valve, the sheer amount of logic in the old game code with history is very large. 



It's really easy to lose perspective when you work on physics for a long time. 

The ultimate goal of physics in a game is to make the game better. It's not to make the best 
or fastest or true to life physics simulation. That is why in-game debugging and 
optimization is very important. 

If the game wants to cast a lot of rays, we need to optimize that first. Games don't generally 
have stacks of cubes everywhere, so that's not the best benchmark for a game physics 
engine. It's best to put it in the game and debug and profile it in the game, if you want it 
to perform well in the game. 
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Game Data Visualization 



Run Generating_Nav 

Here's another example of an unexpected benefit of visualizing everything in physics engine. 
Testbed records and displays all traces. And Ieft4dead navigation system does a lot of 
traces. So, when connected to the game, Testbed visualizes an Al algorithm. 




I suggest we can all spend a couple of weeks up front to make our life more 
comfortable and deliver a better product. Sometimes you can't do that, but 
generally if your time horizon is years, you can afford a couple of weeks. 

What I'm describing is pretty general technique. Game physics is just an example, 
but it's usable in any complex software. 



Debugging Visually 



euno qesuj ru«) 20 ou 


<dU«ZE ttHDOH C*'.P 


-lie: * t * * 


iB_veloci tyfiuffer [ 2 • nBody ♦ 0 ] » pBody->Integrateforces( vGravity, dt ): 
pBody-»C 1 earf orcesO ! 

«_VeToci tyBuffer[ 2 * nBody * 1 ] = pBody->IntegrateTorques( dt ); 
pBody-»C 1 earTorques O ; 


f ( nContactCount > 0 ) 

ISlA*ID_PROf 11E( Preparccontact* ); 
int i*4anifoldCount = 0; 

for ( int nContatt - 0; nContact < nunContactCount; • 


rWanifoldCount a 

) 

■_rptanifo1dCount = r 


if C nJotntCount > 0 ) 

( 

ISLAND PROFILE ( PrepareJoints ); 

uint rOoi ntConstratntBufferStae « nJointCounl • CRnJol ni : :GetMaxConstralntBuf ferSiaeO; 
pJointConstraintBuffer » new tiinf8[ nJointConst raintBuf ferSiae J; 

ui nt8 • pJointConst raintBuf ferEnd = PrepareJointsC nJointCount, pJointConstraintBuffer , l.Of / 
NOIE_UNUSEO( pJointConstraintBufferEnd ); 

Assertobgf pJointConstraintBufferEnd <» pJointConstraintBuffer * n JointConst raintBuf fersiae ) 



.Contacts. OnOOOOOOOOWdMJaS IOiOOOOOOOOO»rS4dO (m. CRnContact *{2821 
|0| 0»OOOOOOOOOS9e54<X> |m_Cadie.|m.Simple*Ca< CRnContact • 

• (CRn< tm.Cacne = |m.Sim©*e»CacHe= |m.ttMetnc= 1.401 CRnComerContact 
<* CRnO |m pShape-0«00000<»0059e54d8 IChOOOOOOOOt CRnOmrtAppingPair 

• mjtC 9 Int 

•k m.nT CONTACT.CONVEX (0) RnContactType.t 

a, m.nf 0 int 


m.Mi [11 


CUtIVectot < Mamfok 


«■ m.TC (m.flAiptia- 1.00000000 m.nMandold-1 1 RnTOIEvwit.t 
[1| 0»OOOODOOOOS9eVMO Im.CacNMro.SimpleKCa. CRnContact • 

«* [CRitC lm.Caehe-lm.Simpi*icCache*|m II Metric. 1401 CRnComenContect 

• CRnO Im jtSOape =OxOOOOOOOOOS9eS448 lOnOOOOOOtXX CRnO^nappingPaa 

• m.nC 8 int 

•k m.nt CONTACT.CONVEX (0) RnContacttype.t 


«k m.Mi [1| C Otl Vector -MarofoK 

•i m.nl 0 int 

•k m.TC [mJIAlpna* 1.00000000 m.nMamfold * • 1 1 RnTOIEventj 

|2[ 0*00000000059e5jb0 (m_Cadie.|m_SimpteiCa. CRnContact ' 

• (CRr* (m_Cache=lm_Sinic»e>Cache=lmJ!Metnc=l401 CRnComenContact 
« CRnO [m J>stupe-0>0000ao00059e33b8 lOaDOOOOOOOt CRnOmrleppingPair 

• rtUlC 7 int 

•k m.nl CONTACT .CONVEX (0) RnContaOType.t 

u, m.nf 0 int 


«. m.M, [1] 

•A m.nT 0 

u, m.TC Im.nAJpra • 1.00000000 m.nMamtold ■ - 1 1 


«, mjiT CONTACT.COMVEX (0) 


m.TC Im.flAtpftaa 100000000 m.nManilold « : 


ClfUVector < MamloR 
RnTOIEvent_t 


COIIVector <MamfoR 
ml 

RnTOIEvent.t 


Eventually, I want my Visual Studio to look like this when I'm debugging. 

You can make a debugger extension with VS Extensibility, and I might make one 
some day. But when I experimented with it, it was very inconvenient to write a VS 
extension of such complexity. 


AutoExp.dat, *.natvis 


[AutoExpand] 

Vector =x=<x,g> y=<y,g> z=<z,g> 

Quaternion= x=<x,g> y=<y, g> z=<z,g> w=<w, g> 

[Visualizer] 

CStrongHandle<*> { 

preview ( $c .m_pBinding->m_Name->m_ResourceNameSymbol . u . m_pAsString ) 

children ( 

#( 

Data: ($T1 *) ($c.m_pBinding->m_pData) , 

[raw members] : [$c, !] 

) 

) 

} 


I hope everyone knows about autoexp.dat. It lets you visualize in text, so to say. 

I strongly suggest you just put all your data structures in there. Try to make them 
more readable - in your Watch window. It helps a lot. 


Organizing Data Structures 



A word about organizing your data strucutres. 

It's most convenient to have a root object from which you can crawl all your data. 

I just gather all the globals in one object, and only publish a pointer to that one object 
to the snooper. 

Physics will have some data structures. You'll parse and process them. 

You'll probably have some external pointers, like pointers to the vertex and index 
buffers for debug drawing. 

You can write routines for snooping and serialization of those things manually 


Out-of-Game Visualization 


Game Physics World 



Bit Stream 


10010110010110... 



Another thought. Visualizing in an external app doesn't have to use 
ReadProcessMemory API. 




Using TCP/IP 


Game 





All the leading physics engine have some sort of over-TCP/IP visualization. 

Unfortunately it doesn't work when the game is stopped in debugger, frozen or 
crashed. Unless you stream your game state all the time, and that spends CPU. If 
you don't know where the problem is, and stream a lot of detailed data, it really 
takes a lot of CPU. And you have to remember to turn it on. 

But it works on remote machine. So it's a viable option. 




Using Shared Memory 


Game 



Shared Memory 


I- Of 74 3560 OOOOOOOl OOOOOOC 

; 00000000 00000000 Of 7 3eal 

in Of 7434 e8 OOOOOOOl iWr r ‘ 

i n Of7434b8 OOOOOOOl 0290c2c 
i- ii n 00000080 OOOOOOSb 029514* 

".I"" I I | I OOOOOOOl 0000(1000 (>57470. 

m i Of 74 3850 OOOOOOOl OOOOOOC 

"•oi ii mi lii' 1 1 1 Of 76a 380 OOOOOOOl Of 7b61c 

H>| ' Of 74 38 fO OOOOOOOl OOOOOOC 

>l"i I' Of 7 3c 7 88 OOOOOOOl 04b’37t 

1 0*0000000 1CK 1161*0 00000000 00000000 OOOOOOC 

o.nooooixnor 'noun 00000000 00000000 OOOOOOC 
■ I Bt'l!i OOOOOOOl 00000000 Of 74 If ? 

I Ill'll H' HU i 0f76a420 OOOOOOOl Of7b61c 

0,00000001 Or •H6270 00000002 00000000 OOOOOOC 

I mill I'll I I Ml 04bb4da0 00000000 OOOOOOC 

' I ' Of 76a 380 OOOOOOOl Of7c77c ' 

'l'i M Of 74 3f 90 OOOOOOOl 000040C 

Of 7 3c 788 OOOOOOOl 04b737c 
00000000 00000000 OOOOOOC 
OOOOOOOl 00000000 04bb4d< 
la, in mi mi -in r: i- 04»d7678 nnnonono nnoonor 



An alternative to TCP/IP that works when the game is stopped in debugger ( or 
frozen or crashed ) would be to use shared memory. 

Win32 API allows processes to share memory directly. One way is to use a memory- 
mapped file, but you need to dedicate a region of memory for sharing, and some 
of the memory would be passed down from the game (like pointer to collision 
resources), and it would be very inconvenient to make sure all of that is allocated 
in the shared memory region 
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Game-centric Viz 



Hopefully we'll eventually be able to save/load/rewind the whole game with this 
method. Our game code base is much bigger than physics engine though, so 
that's a lot of work. 




Serialization 



class Contact_serialized 
{ 

int m_nShapelndex [ 2 ]; 
int m_nNextlndex[ 2 ]; 

}; 


10010110010110... 


We are using the same framework to generate serialize/unserialize code. 
Every time we change our data structures, we just re-run the parser/generator, 
and it's all updated 

We serialize/unserialize a byte stream. But another variant would be e.g. to 
generate a reflected version of all your data, with indices instead of pointers. If 
you have reflection API, it will let you version the serialized data 
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Techniques: The Summary 


Technology 


Technique 

. Clang 


. Detailed Statistics 

• String Template 


. Serialization 

• ReadProcesMemory API 


. Streaming 



. Snooping 



. etc. 


To summarize, we were talking about a few uses of two technologies: Clang 
and String Template, and one simple Windows API: ReadProcessMemory. 
Together, these technologies can enable very powerful and complete 
serialization, an interesting debugging technique I call snooping, gathering 
detailed statistics without tedious coding and without that code going stale the 
next time someone adds a new array somewhere. 

The same serialization template can also be used to send data over TCP/IP, 
and/or serialize into an alternative data structure with reflection API (e.g. into a 
series of dictionaries) that can be saved and loaded into newer versions of the 
engine, and many more uses. 

It is possible to use snooping to read and display every frame in your game 
You will probably skip some frames unless you take care to synchronize 
You will probably waste time at the mutex, for the game has to wait for the 
snooper to finish, and it's less efficient to snoop than to pack local game data 
and send it through a pipe or socket. 

It is much faster to let the game actively serialize the world delta every frame 
and send it over the wire. It will also guarantee no skipped frames. 

It is easier to just reuse existing serialization and send every frame over the 
wire 

For a quick fix, just see the biggest pieces of data (probably mesh 
descriptions) and cut them out of the stream if you already sent them 
When visualizing streamed physics, I tied the Vbs/IBs directly into the world 
data structures. 

This automatically recreated Vbs/IBs on every streamed frame 


To fix that, I had to cache those debug objects off and have special callbacks 
reuse them. I did it with clang's annotations. 


Pros & Cons 


Pros 


Con 

• Reliable Process 


• Implementation 

• Repeatable Debugging 


Complexity 

• Human Error Excluded 



• Neither Tedious nor 



Outdated Code 




Full-world serialization of this type includes everything, and has very high 
reliability. You don't have to worry about missing a thing or two: the 
parser/generator is unlikely to make a human error. 

This is usable to dump suspicious game frames for later examination, or 
streaming them. When debugging later, it's very much like debugging in-game, 
but it's repeatable and much more convenient. 

I don't like manual non creative work, and serialization code is always very low 
in creativity. Getting rid of it makes iteration faster and doesn't distract me from 
the creative process. 

Among the drawbacks, this method is more complex to start using it. You can 
still use the old methods in parallel with this method. 



Make work fun! It pays to do that. 
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